Spatiotemporal evolution characteristics and prediction analysis of urban air quality in China

To describe the spatiotemporal variation characteristics and future trends of urban air quality in China, this study evaluates the spatiotemporal evolution features of, and linkages between, the air quality index (AQI) and six primary pollution indicators, using air quality monitoring data from 2014 to 2022. Seasonal autoregressive integrated moving average (SARIMA) and random forest (RF) models are built to forecast air quality. (1) The findings indicate that pollutant levels and AQI values in Chinese cities decline from year to year and follow a "U"-shaped monthly pattern: they are high in winter, decrease in spring, are low in summer, and rise in autumn (O3 shows the opposite pattern). (2) Spatially, the AQI of Chinese cities is low in the southeast and high in the northwest, and low in coastal areas and higher inland. The correlation coefficients between AQI and the concentrations of fine particulate matter (PM2.5), inhalable particulate matter (PM10), carbon monoxide (CO), nitrogen dioxide (NO2), sulfur dioxide (SO2), and ozone (O3) are 0.89, 0.84, 0.54, 0.54, 0.32, and −0.056, respectively. (3) For short-term AQI prediction, the RF model performs better than the SARIMA model. The long-term forecast indicates that the average AQI value in Chinese cities is expected to decrease by 0.32 points by 2032 compared with the 2022 level of 52.95. This study offers guidance for the analysis and prediction of urban air quality.

an important theoretical basis for relevant government departments to conduct prevention and control policies. These steps would help China actively respond to air pollution, rather than simply passively monitoring it.
The monitoring, evaluation, and prediction of ambient air quality has been of great interest to scholars worldwide [1][2][3]. Air quality research in China mainly focuses on three areas. The first area includes air quality studies at different scales and in specific regions. For example, studies have analyzed the interannual variation characteristics of air quality in central and eastern China [4], in typical northern cities [5], and in typical towns in the north and south of the country [6,7]. Studies have also considered interannual variations in air quality [8], compared urban and rural air quality levels, and analyzed air quality variations during significant festivals and events. The second area focuses on the factors influencing air quality. These factors are complex and include pollutant factors [9,10], population density [11], energy [12], anthropogenic factors [13,14], meteorological elements [15,16], and socio-economic factors [17,18]. He et al. conducted a study using AQI, meteorological factors, and socio-economic data. That study found that climate conditions were the leading causes of air pollution in Hebei Province, while anthropogenic emissions were the primary factors contributing to severe air pollution in the same region [19]. The third area involves air quality prediction, which relies on three main types of methods: latent forecasts [20], numerical forecasts [21], and statistical forecasts [22][23][24]. Statistical forecasting predicts future trends by analyzing statistical patterns of input-output information related to air pollution. This approach has attracted many researchers because it is quick and simple. Finally, the ensemble learning algorithm random forest (RF) is a machine learning method that has become popular because of its good robustness and high prediction accuracy.
The models and methods used in previous studies on the spatiotemporal evolution characteristics of urban air quality in China are relatively mature. However, few studies have analyzed and predicted air quality for multiple cities across China and for longer observation periods. In addition, previous research focused primarily on predicting AQI values at specific historical moments but did not incorporate historical concentration values of the six major pollutants into the prediction analysis. To address this gap, this study analyzes the daily AQI and data on six major air pollutants from May 2014 to August 2022 for 388 major cities in 31 provinces in China. The study analyzes the characteristics of the spatial and temporal distribution of air quality in Chinese cities, the changing trends, and the correlations among the major pollutants with significant effects. Moreover, historical AQI values and concentrations of the six major air pollutants are used as independent variables to establish SARIMA and RF models and predict the future development of urban air quality indicators in China. The study results provide a scientific basis for atmospheric environment monitoring and air pollution control departments and may help inform measures to improve future air quality.

Materials and methods
Data source and data pretreatment. The air quality data used in this study are from the China General Environmental Monitoring Station, a platform that publishes real-time national urban air quality data. A total of 1,050,590 daily air quality data points are used for analysis and modeling, covering May 13, 2014 through August 27, 2022, for 388 major cities in 31 provincial-level administrative regions in China (excluding Hong Kong, Macao, and Taiwan). The available data include the AQI and concentrations of O3, PM2.5, PM10, SO2, NO2, and CO. The AQI is an essential comprehensive indicator of a city's air quality. It is calculated from the concentrations of the six principal pollutants and increases with the severity of air pollution: larger AQI values indicate higher levels of air pollution, and smaller AQI values indicate lower levels. The AQI is divided into six grades according to The Technical Provisions on Ambient Air Quality Index (for trial): excellent (0-50), good (51-100), mild pollution (101-150), medium pollution (151-200), heavy pollution (201-300), and serious pollution (301-500).
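As a minimal illustration of the six-grade scheme above, the following Python sketch maps an AQI value to its grade. The thresholds are the ones quoted in the text; the function name `aqi_grade` and the treatment of non-integer values (a value such as 50.5 falls into the next grade up) are assumptions made for illustration only.

```python
from bisect import bisect_left

# Upper bounds of the six AQI grades quoted in the text (0-50, 51-100, ...).
GRADE_UPPER_BOUNDS = [50, 100, 150, 200, 300, 500]
GRADE_NAMES = [
    "excellent",
    "good",
    "mild pollution",
    "medium pollution",
    "heavy pollution",
    "serious pollution",
]


def aqi_grade(aqi):
    """Return the grade name for an AQI value between 0 and 500.

    Assumption: a non-integer value between two bounds (e.g. 50.5)
    is assigned to the higher grade.
    """
    if not 0 <= aqi <= 500:
        raise ValueError(f"AQI out of the 0-500 range: {aqi}")
    # bisect_left returns the index of the first upper bound >= aqi.
    return GRADE_NAMES[bisect_left(GRADE_UPPER_BOUNDS, aqi)]
```

Counting the grades returned for each day of a year would give the kind of per-grade shares of days that a yearly grade table summarizes.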
This study examines the spatiotemporal variation characteristics and trends of AQI using daily real-time and time-varying data. First, the data are classified and summarized in Python (Jupyter Notebook 6.3.0). Missing values are replaced with the average of the corresponding city's data.
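The missing-value step can be sketched as follows. This is a standard-library illustration under stated assumptions, not the authors' code: the record layout (dictionaries with `city`, `date`, and `aqi` keys), the function name, and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fill_missing_with_city_mean(records, field="aqi"):
    """Return copies of the records with None in `field` replaced by the city mean."""
    observed = defaultdict(list)
    for row in records:
        if row[field] is not None:
            observed[row["city"]].append(row[field])
    city_mean = {city: mean(values) for city, values in observed.items()}

    filled = []
    for row in records:
        new_row = dict(row)  # leave the input records untouched
        if new_row[field] is None:
            # A city with no observed values at all stays missing (None).
            new_row[field] = city_mean.get(new_row["city"])
        filled.append(new_row)
    return filled


# Hypothetical toy records, for illustration only.
records = [
    {"city": "A", "date": "2014-05-13", "aqi": 40},
    {"city": "A", "date": "2014-05-14", "aqi": None},
    {"city": "A", "date": "2014-05-15", "aqi": 60},
    {"city": "B", "date": "2014-05-13", "aqi": 100},
    {"city": "B", "date": "2014-05-14", "aqi": None},
]
filled = fill_missing_with_city_mean(records)
```

The same function can be applied once per pollutant column by changing `field`.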

Research methods. Correlation analysis and descriptive statistical analysis. Correlation analysis is widely used to analyze air quality problems, and studies have shown that this approach can effectively identify the key factors influencing hazy weather and elevated PM2.5 concentrations. Therefore, this study uses correlation analysis to investigate the correlation between AQI and the six major pollutant concentration indicators, with the goal of exploring the causes of these correlations with reference to existing studies. In addition, this study provides a descriptive statistical analysis of the annual and seasonal variations of urban air quality in China and of its provincial and municipal distribution characteristics. This provides a basis for the subsequent predictions.
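The text does not name the correlation measure; assuming it is the Pearson coefficient, the correlation between AQI and each pollutant can be sketched as below. The helper names and the toy values are hypothetical and chosen only to show a positive and a negative correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlations_with(target, columns):
    """Correlate every named column with the target column."""
    return {name: pearson_r(values, target) for name, values in columns.items()}


# Hypothetical daily values, for illustration only.
aqi = [40, 55, 90, 120, 60]
pollutants = {
    "PM2.5": [20, 30, 60, 85, 35],
    "O3": [110, 95, 70, 50, 100],
}
r = correlations_with(aqi, pollutants)
```

Applying `pearson_r` to every pair of pollutant columns would give the pairwise coefficients needed to check for multicollinearity.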
SARIMA model. Time series decomposition reveals that monthly data on air pollution-related indicators in major Chinese cities exhibit both long-term and seasonal fluctuations. Furthermore, the six pollutant concentration indicators are significantly correlated with the AQI values for major cities in China, and there may also be correlations among the six pollutants themselves. This indicates multicollinearity among the factors, which violates the condition of mutual independence and makes direct linear regression analysis inappropriate. To address this issue, this study applies time series and random forest regression models to analyze and predict AQI. First, the SARIMA model is established based on the characteristics of previous AQI data, with the goal of predicting AQI data in 2022.
The general form of the SARIMA model is SARIMA(p, d, q)(P, D, Q)_s, expressed as:
$$\Phi_p(L)\,A_P(L^s)\,\Delta^d \Delta_s^D y_t = \Theta_q(L)\,B_Q(L^s)\,\mu_t$$
where $y_t$ is the time series; $\mu_t$ is a random term; $L$ is the lag operator; $\Phi_p(L)$ denotes the autoregressive characteristic polynomial; p denotes the autoregressive maximum lag; $\Theta_q(L)$ denotes the moving average characteristic polynomial; and q denotes the moving average maximum lag. The term $A_P(L^s)$ is the seasonal autoregressive characteristic polynomial; s denotes the length of the seasonal period; P denotes the seasonal autoregressive maximum lag; $B_Q(L^s)$ denotes the seasonal moving average characteristic polynomial; Q denotes the seasonal moving average maximum lag; and d denotes the order of non-seasonal (ordinary) differencing. The term $\Delta_s^D y_t$ denotes the D-times seasonal difference of $y_t$, and D denotes the order of seasonal differencing.
Random forest model. Past theoretical and empirical research has shown that AQI values in Chinese cities have clear spatial and temporal interactions. The magnitude of AQI values is influenced by the spatial interactions and by the cumulative effect of historical pollutant concentrations over time. This study establishes a random forest regression model to predict the AQI from a nonlinear perspective, combining different pollutant impact factors over time and using the six pollutant concentration indicators at historical moments as independent variables.
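One way to turn pollutant concentrations at historical moments into independent variables is to pair each AQI value with the pollutant readings of the preceding time steps. The sketch below shows this under stated assumptions: the function name, the `n_lags` parameter, and the toy numbers are hypothetical, and the text does not state how many past steps the study uses.

```python
def make_lagged_samples(features, target, n_lags):
    """Pair each target value with the feature rows of the previous n_lags steps.

    features: list of per-step feature lists (e.g. six pollutant concentrations)
    target:   list of per-step target values (e.g. AQI), same length as features
    Returns (X, y), where each row of X concatenates the features at
    t-1, t-2, ..., t-n_lags and y holds the target at time t.
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X, y = [], []
    for t in range(n_lags, len(target)):
        row = []
        for lag in range(1, n_lags + 1):
            row.extend(features[t - lag])
        X.append(row)
        y.append(target[t])
    return X, y


# Hypothetical toy data with two features per step, for illustration only.
features = [[1, 10], [2, 20], [3, 30], [4, 40]]
target = [100, 200, 300, 400]
X, y = make_lagged_samples(features, target, n_lags=2)
```

The resulting `X` and `y` have the row-per-sample layout that regression libraries generally expect as training input.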
The random forest algorithm is an ensemble model consisting of decision trees $h_i(x_t)$. Each regression tree takes the mean value at each terminal node as its prediction. Thus, for a sample $x_t \in R^j$, where j is the number of features, the random forest prediction $h(x_t)$ is the average of the predictions of all subtrees $h_i(x_t)$, expressed as follows:
$$h(x_t) = \frac{1}{k}\sum_{i=1}^{k} h_i(x_t)$$
where k is the number of decision subtrees. Before using the model for forecasting, we first evaluate the model's predictive performance. Model accuracy is generally measured using the mean absolute percentage error (MAPE), root mean square error (RMSE), mean squared error (MSE), and mean absolute error (MAE). In addition, the goodness of fit (GOF) and explained variance score (EVS) are commonly used to assess forecasting methods. A combination of these metrics should be considered when measuring prediction performance, to ensure an effective modelling outcome.
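The accuracy metrics named above can be computed directly from paired actual and predicted values. The sketch below uses the standard definitions and assumes that the goodness of fit refers to the coefficient of determination R²; the function name and the toy numbers are hypothetical.

```python
from math import sqrt


def regression_scores(actual, predicted):
    """Return MAE, MSE, RMSE, MAPE, R2 and EVS for two equally long sequences.

    MAPE is returned as a fraction (0.02 means 2%) and requires non-zero
    actual values; R2 and EVS require the actual values to vary.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    # EVS compares the variance of the errors with the variance of the data,
    # so unlike R2 it ignores a constant bias in the predictions.
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    evs = 1 - var_err / (ss_tot / n)
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse),
            "MAPE": mape, "R2": r2, "EVS": evs}


# Hypothetical values, for illustration only.
scores = regression_scores([50, 60, 70, 80], [52, 58, 71, 79])
```

R² and EVS coincide when the prediction errors average to zero, as they do in this toy example; a gap between the two points to a systematic bias.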

Analysis of the results. Spatial and temporal evolutionary characteristics of urban air quality in China.
Annual analysis of air quality. The first step is to describe the overall distribution characteristics and trends of the daily average AQI values and the concentrations of the six major pollutants (CO, NO2, O3, PM10, PM2.5, and SO2) for Chinese cities from 2014 to 2022. Table 1 shows that the AQI decreased year by year; the concentrations of the six primary pollutants also decreased year by year. Table 2 shows the daily air quality classified by grade for each year. The urban air quality in China reached the "Excellent" level on the following percentages of days in the nine years from 2014 through 2022: 75.00%, 78.97%, 82.24%, 83.74%, 86.73%, 88.67%, 91.41%, 91.36%, and 92.42%, respectively. This indicates a generally increasing trend. The percentages of days with heavy and serious pollution for the same nine years are 2.46%, 2.97%, 2.48%, 2.21%, 2.02%, 1.601%, 1.115%, 1.37%, and 0.98%, respectively. This shows a generally decreasing trend. In general, the air quality of most cities is rated Excellent, followed by Good, with only a certain proportion of days reporting mild pollution. There are even fewer days classified as having medium pollution or above. Although the proportion of days with air pollution in Chinese cities has been decreasing in recent years, the proportion is not small, and air pollution should still be actively managed and controlled.
The correlation coefficients between AQI and each of the six pollutants PM2.5, PM10, CO, NO2, SO2, and O3 are 0.89, 0.84, 0.54, 0.54, 0.32, and −0.056, respectively (Fig. 1). O3 is the only pollutant with a negative correlation with AQI; the other five pollutants are positively correlated with AQI. Figure 1 shows that increases in PM10 and PM2.5 concentrations are associated with the largest increases in AQI. This may indicate that AQI is more sensitive to changes in particle concentration.
Changes in ozone are mainly driven by solar radiation; as such, there is no strong correlation between changes in ozone concentration and changes in AQI. In addition, the correlation coefficients among the six pollutants, in particular between PM2.5 and PM10, PM2.5 and CO, and CO and NO2, exceeded 0.58. Lang Lijun et al. also found that PM was strongly correlated with NO2, CO, and O3-8h [25]. This indicates multicollinearity among the factors, highlighting the complexity of the correlations.
Seasonal analysis of air quality. For the comparative analysis, the four seasons are defined by Gregorian calendar months: spring is March to May, summer is June to August, autumn is September to November, and winter is December to February. Table 3 shows the mean AQI and the mean concentrations of the six pollutants in each season; the AQI and the six pollutants in Chinese cities show significant seasonal variation. This result closely aligns with the findings of Ji Mengyi et al. [15]. In particular, the AQI in winter, during the heating period, is generally higher, with an average of 86.64 (good). The overall AQI is lower in summer, with an average of 47.62 (excellent). The results show that the air quality in Chinese cities is worst in winter and best in summer, due to seasonal variation in both natural conditions and human activities. In winter, the weather is dry with little precipitation, temperatures are low, air pressure is stable, and temperature inversions occur. These conditions do not facilitate pollutant diffusion and dilution. As the heating season begins, pollutant emissions increase, exacerbating air pollution. In spring and autumn, the weather is mostly windy and sandy, affecting the ambient air quality. In summer, precipitation increases, humidity is high, and localized convection over the city is strong. This facilitates the deposition, dilution, and diffusion of pollutants, improving air quality. Table 3 also shows that PM10 and PM2.5 concentrations were highest in winter, and that PM2.5, PM10, O3, and NO2 were the highest of the air quality indexes in spring. O3 was highest in summer, likely because constant high temperatures and intense sunlight in summer drive the photochemical reactions of nitrogen oxides and volatile organic compounds from vehicle exhaust and factory smoke emissions, which produce more ozone [26][28].
Table 3. Data for air quality factors in different seasons.
Figure 2 shows the monthly trend of AQI values. The figure shows that AQI is related to the month and that the monthly AQI distribution has a certain periodicity. The monthly average AQI in 2014 is significantly higher than the values in subsequent study years, especially in April, June, August, and November. The monthly average AQI values for 2019-2022 are significantly lower than those for 2014. Overall, the monthly average AQI decreased continuously from March to July, reaching its lowest value from the end of July to the beginning of August. The value then gradually increased to its highest value in February of the following year. The AQI in Chinese cities shows a monthly "U"-shaped pattern: it is high in winter, decreases in spring, is low in summer, and rises in autumn. Among the six pollutants, five show a "U"-shaped distribution; only O3 has an "inverted U"-shaped distribution. This finding provides valuable insights into the relationship between the air quality index and pollutants, which can inform the development of targeted air pollution control measures.
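The season definition used in this section (March to May as spring, June to August as summer, September to November as autumn, December to February as winter) can be expressed as a small helper, together with a per-season mean. This is an illustrative sketch; the function names and the toy values are hypothetical.

```python
from collections import defaultdict


def season_of(month):
    """Map a Gregorian month number (1-12) to the season used in this section."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    if 3 <= month <= 5:
        return "spring"
    if 6 <= month <= 8:
        return "summer"
    if 9 <= month <= 11:
        return "autumn"
    return "winter"  # December, January, February


def seasonal_means(monthly_values):
    """Average (month, value) pairs within each season."""
    grouped = defaultdict(list)
    for month, value in monthly_values:
        grouped[season_of(month)].append(value)
    return {season: sum(vals) / len(vals) for season, vals in grouped.items()}


# Hypothetical monthly AQI values, for illustration only.
toy = [(12, 90), (1, 80), (2, 70), (6, 40), (7, 50)]
means = seasonal_means(toy)
```

Note that winter spans two calendar years, so a year-by-year seasonal comparison would also need a rule for assigning December to a year.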
Provincial distribution of air quality. Figure 3 shows the spatial distribution of AQI in Chinese cities from 2014 to 2022. The results indicate a significant imbalance in the spatial distribution of urban air quality in China. Air quality is poorer in China's central inland and northwestern regions and better in the southeastern coastal and highland areas. In general, the AQI of Chinese cities is low in the southeast and high in the northwest, and low on the coast and high in the interior. These observations are largely consistent with the findings of Lin Xueqin and Wang Dai (2016) [17] and Wan Qing et al. (2022) [29]. This finding has reference value for understanding the regional disparities in urban air quality in China and for in-depth research into the root causes of air pollution. It also supports the development of air pollution control strategies tailored to specific regions.
The AQI values of the 31 provinces are ranked. The ten provinces with the lowest AQI values are (ranked from lowest to highest AQI): Hainan, Xizang, Yunnan, Fujian, Guizhou, Guangdong, Heilongjiang, Guangxi, Qinghai, and Zhejiang. These ten provinces have satisfactory overall air quality and are free of air pollution. The ten provinces with the worst air quality nationally are (ranked from highest to lowest AQI): Henan, Xinjiang, Hebei, Tianjin, Shanxi, Beijing, Shandong, Shaanxi, Ningxia, and Hubei. The overall air quality of these ten provinces is acceptable; however, some cities are more polluted than others, possibly affecting the health of susceptible people.
The primary pollutants in the ten provinces with the best air quality are PM10, PM2.5, and O3. The concentrations of these three pollutants significantly influence the AQI values. In particular, the correlations of PM10 and PM2.5 with AQI exceed 0.94, and the correlation coefficient between O3 concentration and AQI reaches 0.78. The correlation coefficient between PM2.5 and PM10 reaches 0.9. Because PM10 includes PM2.5, an increase in PM2.5 also raises the PM10 concentration, and the rise in PM10 cannot be smaller than the increase in PM2.5. As such, the correlation of 0.9 reflects reality. PM10 and PM2.5 are also the main pollutants in the ten provinces with the worst air quality.
Air quality municipal distribution. This study analyzes the air quality of 388 major cities in China based on the magnitude of AQI values. The ten cities with the best air quality are as follows (ranked from best to less good): Tibetan Autonomous Prefecture of Garzê, Linzhi, Danzhou, Sanya, Sansha, Tibetan Qiang Autonomous Prefecture of Ngawa, Yushu Tibetan Autonomous Prefecture, Qiannan Buyi and Miao Autonomous Prefecture, Altay Prefecture, and Diqing Tibetan Autonomous Prefecture. The ten cities with the worst air quality in the country are (ranked from poorest to better): Hotan Prefecture, Kashgar Prefecture, Aksu Prefecture, Kizilsu Kirghiz Autonomous Prefecture, Tulufan, Kuerle, Shijiazhuang, Anyang, Handan, and Xingtai. The main pollutants in the ten cities with the best air quality are PM2.5, PM10, and NO2; the correlation coefficients between these three pollutants and AQI are 0.76, 0.92, and 0.38, respectively. The correlation between PM2.5 and PM10 reaches 0.81; however, the other correlations among the six major pollutants are less than 0.37 and are not statistically significant. Figure 4 shows that CO, SO2, NO2, and O3 contribute little to the ambient air pollution of the ten most polluted cities. In contrast, PM2.5 and PM10 are the pollutant factors that most affect the ambient air quality of these cities. These pollutants are also closely correlated with urban and provincial air quality. There is a strong positive correlation between PM2.5 and PM10, at 0.9, indicating that an increase in PM2.5 concentration accompanies growth in PM10 levels.
AQI prediction based on SARIMA model. Model parameter estimation. First, we plot the AQI time series from May 2014 to August 2022 and decompose it into trend, seasonal, and residual components to check for stationarity (Fig. 5). Figure 5 shows significant fluctuations in the AQI values for China from 2014 to 2022. The series has a time-based trend, with a general decrease each year, and significant seasonal characteristics. This indicates that it is a non-stationary series. Therefore, this study generates a stationary, non-white-noise series by applying ordinary and seasonal differencing to the original data (Fig. 5c,d). Stationarity is tested using the Augmented Dickey-Fuller (ADF) test. The results are shown in Online Resource 2. The ADF test statistics for the seasonally differenced and first-order differenced series are below the critical values at the 1%, 5%, and 10% levels, so the differenced series can be treated as stationary.
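The ordinary and seasonal difference operations described above can be illustrated with a few lines of Python. The sketch uses a synthetic monthly series made of a linear downward trend plus a fixed 12-month pattern; the series, the function name, and the 12-month period are assumptions for illustration, and the ADF test itself is not reproduced here.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly series: linear downward trend plus a 12-month pattern.
seasonal_pattern = [30, 25, 10, 0, -10, -20, -25, -20, -10, 0, 10, 20]
series = [100 - t + seasonal_pattern[t % 12] for t in range(48)]

# Seasonal difference (lag 12) cancels the repeating 12-month pattern and
# leaves the constant yearly change caused by the trend.
seasonal_diff = difference(series, lag=12)

# Ordinary first difference (lag 1) then removes that constant level.
stationary = difference(seasonal_diff, lag=1)
```

For this noise-free toy series the result is identically zero; with real data, what remains after both differences is the part that the autoregressive and moving average terms are fitted to.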
For the modeling, this study uses a combination of the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) to determine the optimal order of the model. The BIC statistic is minimized by testing different combinations of the p and q parameters in repeated experiments and by combining these with the results of automatic screening in Python. The model is determined to be SARIMA(2, 1, 1)(0, 1, 1)_12. The model parameters are provided in Online Resource 3.
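The order selection described above compares information criteria across candidate models. With the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood, the comparison can be sketched as follows. The candidate labels and log-likelihood values are hypothetical placeholders, not results from this study.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * log(n_obs) - 2 * log_likelihood


def best_by_bic(candidates, n_obs):
    """Return the label of the candidate with the lowest BIC.

    candidates maps a label to a (log_likelihood, n_params) pair.
    """
    return min(candidates, key=lambda label: bic(*candidates[label], n_obs))


# Hypothetical fitted candidates: label -> (log-likelihood, parameter count).
candidates = {
    "model_a": (-300.0, 3),
    "model_b": (-297.0, 5),
}
chosen = best_by_bic(candidates, n_obs=100)
```

In this toy case the larger model has the lower AIC but the higher BIC, because BIC penalizes each extra parameter by ln n rather than by 2; this is why minimizing BIC tends to select more parsimonious orders.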
Model fitting prediction. The SARIMA model equation is as follows, written symbolically for the selected orders, with the coefficient estimates given in Online Resource 3:
$$(1-\varphi_1 L-\varphi_2 L^2)(1-L)(1-L^{12})\,y_t=(1+\theta_1 L)(1+\beta_1 L^{12})\,\mu_t$$
where $L$ is the lag operator, $y_t$ is the monthly AQI series, and $\mu_t$ is the random term. Figure 6 shows an overall good model fit, reflecting the trend of the monthly average AQI for Chinese cities over a short time scale. The residual line chart (Fig. 6b) indicates that the model is accurate, with some seasonal fluctuation in the residuals between the predicted and actual values. The deviation between the predicted and actual values may stem from unavoidable fitting error, because the SARIMA model assumes that there are no significant changes in other influencing factors. For example, the predicted value for February 2022 is slightly larger than the actual value, perhaps because the model does not account for the ban on fireworks during the traditional Chinese New Year.
A white noise test is performed on the residual series of the model to assess its adequacy. If the residual series is white noise, the model is considered to explain the time series effectively. Otherwise, the model needs further improvement. The QQ plot in Fig. 6c shows that the residual series is approximately normally distributed. The residuals pass the white noise test, indicating that the useful information in the time series has been extracted. What remains is random perturbation, which cannot be predicted or exploited. Therefore, the monthly AQI values predicted by the SARIMA(2, 1, 1)(0, 1, 1)_12 model are close to the actual values, and the established model fits well.
Prediction of AQI values based on random forest model. Random forest assessment of pollution factor importance. The random forest algorithm can predict air quality from a non-linear perspective and can be used to analyze, both quantitatively and qualitatively, the relationships between pollutant impact factors and air quality and their degree of influence on AQI. To explore the importance of the six main pollutants, this study uses the constructed random forest model to select the important pollutant features affecting air quality. The air quality grades from May 2014 to August 2022 are used as categorical variables. The AQI values and pollution factor data in the test set were entered into the trained RF prediction model to obtain the relative importance of each air pollutant concentration index. The relative importance of the six pollutants PM10, PM2.5, CO, SO2, NO2, and O3 for the AQI values is 39.69%, 32.28%, 13.04%, 8.80%, 5.37%, and 0.82%, respectively. The random forest model shows that PM10 and PM2.5 are the two indicators that most strongly influence the AQI value, followed by CO, SO2, and NO2. These results are consistent with the results of the correlation coefficient analysis.
Forecast analysis of the random forest model. This study uses the average concentrations of the six major pollutants (PM2.5, PM10, O3, NO2, CO, and SO2) at historical moments from May 2014 to December 2021 as independent variables. The AQI values calculated from these pollution factors are used as the dependent variable to construct a random forest model that predicts AQI values for Chinese cities in 2022. Figure 7 shows that the predicted values are very close to the measured values, indicating a consistent trend and high prediction accuracy. However, certain factors (such as a sharp drop in temperature) cause some abnormal fluctuations in AQI. Because the random forest model does not contain information about those factors, a certain amount of error between the predicted and actual values is expected.
A white noise test is performed on the residual sequence of the model to assess the model's suitability. The residual QQ plot in Fig. 7c indicates that the residual sequence passes the white noise test. The R² of the random forest model is 97.61%; the MAE is 1.3841; the MAPE is 0.0228; and the EVS is 97.65%. This further indicates that the prediction accuracy is within a reasonable range and that the model fits well. In general, the trends of the predicted and observed AQI values are highly consistent. This supports the conclusion that the regression model established using the RF algorithm performs well in predicting the AQI value.
Long-term forecasting helps analyze air quality trends and patterns from a macroscopic perspective. Therefore, after verifying the feasibility and validity of the two models, this study applies the random forest model to develop long-term forecasts of the AQI and the concentrations of the six pollutants. The prediction results indicate that the average AQI over the next ten years is expected to be 51. Among the pollutants, PM10, NO2, and O3 are expected to decrease most significantly. The forecast results indicate that the average air quality in Chinese cities is projected to improve further. This is consistent with the efforts of the government and the public to improve air quality and control air pollution. The projections also indicate that the sharp decrease in pollutant concentrations, particularly of aerosol particulate matter, may reduce the cooling effect of particulate matter. This may hinder the expected mitigation of global warming. Therefore, it would be more appropriate to implement coordinated emission reduction measures that target both greenhouse gases and air pollutants, to achieve the goal of reducing global emissions.
Ethics approval. This is an observational study.
Ethical responsibilities. All authors have read, understood, and have complied as applicable with the statement on "Ethical responsibilities of Authors" as found in the Instructions for Authors and are aware that with minor exceptions, no changes can be made to authorship once the study is submitted.

Conclusions and discussion
This research studies the temporal and spatial distribution characteristics of AQI and six major pollutants, using statistical analysis and correlation analysis methods and time-based air quality monitoring data for 388 cities in 31 provinces of China from 2014 to 2022. The future air quality of Chinese cities is predicted using the SARIMA and random forest models. There are three key findings:
1. There is a considerable downward trend in the AQI value and pollutant concentrations of Chinese cities across the study years. The AQI exhibits a "U"-shaped monthly trend: it is high in winter, decreases in spring, is low in summer, and increases in autumn. Summer generally has the best air quality and winter the worst (O3 shows the opposite trend). Spatially, the AQI of Chinese cities is low in the southeast and high in the northwest, and low on the coast and high in the interior.
2. PM2.5 and PM10 are the principal pollutants in the provinces and cities in China with the worst air quality. Provincial and local authorities should pay close attention to SO2, CO, and NO2 emissions while concentrating on preventing and reducing PM2.5 and PM10 emissions. Pollution control practices should adhere to the principle of "prevention-oriented, combined with prevention and control" to maintain and continuously improve air quality. These pollutants are mainly caused by emissions from the burning of fossil fuels. As such, cities should adopt regional mitigation strategies and address air pollution in a coordinated manner. Actions taken by any single city are unlikely to be effective within a regional cluster of heavily polluted cities. Air pollution management should therefore not be restricted to a single city, and a joint prevention and control approach is needed across administrative regions. Ultimately, an international system is needed to prevent and manage air pollution.
3. This study evaluates the importance of the six pollutant variables for AQI using the random forest model. The results show that PM10 and PM2.5 remain the two pollutant indicators with the most critical influence on AQI, which is consistent with the results of the correlation analysis. Predicting the future AQI is a complex multivariate nonlinear problem, and both the SARIMA and RF models predict AQI well. Of the two, the RF model has the higher prediction accuracy, and the historical concentrations of the six pollutants may be more suitable than the AQI variables as training inputs for air quality prediction. Experience has shown that environmental protection measures, such as road watering and a ban on fireworks, have effectively controlled coarse particles and reduced particle concentrations such as PM10 and PM2.5. It is also largely accepted that NO2, CO, and SO2 generally come from fuel combustion and motor vehicle exhaust. In the future, declines in these pollutant concentrations may reflect the overall level of commitment to energy-saving and emission-reduction approaches, such as the promotion of new energy vehicles in cities over the next 10 years.
This study's statistical analysis and modeling methods offer guidance for studies of the spatial and temporal evolution characteristics of air quality and its future prediction. However, there remain many shortcomings and areas worth further research. When modeling the factors influencing AQI, this study did not consider meteorological elements, future economic development level, industrial structure, population change, or policy interventions. Follow-up studies should consider the influence of more factors on air quality in China. In addition, statistics-based approaches that link pollutant concentrations to AQI in order to predict future air quality remain an active research topic. Statistical methods essentially make forecasts from historical data; as such, they have a significant advantage in multi-frequency short-term forecasting, because their computational effort is several orders of magnitude smaller than that required for numerical methods. However, the disadvantage of the statistical approach is that it requires a large amount of historical air quality data for model training to improve prediction accuracy. With the advent of the Big Data era, traditional regression models are becoming obsolete, and machine learning, an interdisciplinary field of statistics and computer science, is flourishing due to increased computing power. Studies such as Feng et al.'s work on using wavelet transform and artificial neural networks to predict PM2.5 highlight the potential of combining physical models and machine learning in air quality prediction [30]. The random forest algorithm is a prominent machine learning algorithm that is expected to evolve further and become a hot topic in the optimization of big data processing algorithms.
In closing, it is important to note that decreasing pollutant concentrations, especially the mass concentration of aerosol particles (PM), may reduce the cooling effect of the particulate matter. This may complicate the overall effort to mitigate global warming. Despite this, the temperature change caused by the sudden reduction in pollutant concentration is relatively small, and it is urgent to reduce greenhouse gases and air pollution around the world.

Data availability
The datasets analyzed for this study are located in the real-time national urban air quality release platform of the China General Environmental Monitoring Station (https://air.cnemc.cn:18007/).